† Corresponding author. E-mail:
Project support by the National Key Research and Development Program of China (Grant No. 2018YFF0301000) and the National Natural Science Foundation of China (Grant Nos. 71673161 and 71790613).
In complex networks, identifying influential spreader is of great significance for improving the reliability of networks and ensuring the safe and effective operation of networks. Nowadays, it is widely used in power networks, aviation networks, computer networks, and social networks, and so on. Traditional centrality methods mainly include degree centrality, closeness centrality, betweenness centrality, eigenvector centrality, k-shell, etc. However, single centrality method is one-sided and inaccurate, and sometimes many nodes have the same centrality value, namely the same ranking result, which makes it difficult to distinguish between nodes. According to several classical methods of identifying influential nodes, in this paper we propose a novel method that is more full-scaled and universally applicable. Taken into account in this method are several aspects of node’s properties, including local topological characteristics, central location of nodes, propagation characteristics, and properties of neighbor nodes. In view of the idea of the multi-attribute decision-making, we regard the basic centrality method as node’s attribute and use the entropy weight method to weigh different attributes, and obtain node’s combined centrality. Then, the combined centrality is applied to the gravity law to comprehensively identify influential nodes in networks. Finally, the classical susceptible-infected-recovered (SIR) model is used to simulate the epidemic spreading in six real-society networks. Our proposed method not only considers the four topological properties of nodes, but also emphasizes the influence of neighbor nodes from the aspect of gravity. It is proved that the new method can effectively overcome the disadvantages of single centrality method and increase the accuracy of identifying influential nodes, which is of great significance for monitoring and controlling the complex networks.
With the development of science and technology and the progress of society, various networks are gradually formed, such as aviation networks,[1] traffic networks,[2] computer networks,[3] social networks,[4] and biological networks,[5] which are closely related to our life and work. In recent decades, the researches on complex networks have gradually attracted the attention of scholars from all walks of life. It is found that in many real-society networks, different nodes have different effects on the networks.[6–8] Therefore, identifying influential nodes is of great significance for stably operating and effectively controlling the networks, such as finding social leader,[9] mitigating disease spreading,[10] controlling information dissemination,[11,12] detecting community structures,[13] etc., and it has become one of the research hotspots of complex networks.
In the past researches, scientists have proposed a series of centrality methods, such as the degree centrality (DC),[14,15] betweenness centrality (BC),[16,17] closeness centrality (CC),[16] eigenvector centrality (EC),[18] k-shell,[7] PageRank (PR),[19] H-index algorithm,[19] etc. However, sometimes these centrality methods are one-sided and inaccurate in identifying influential nodes in complex networks. For example, degree centrality only considers the local information about nodes, regardless of the role of neighbor nodes; the CC and BC focus on the shortest path, but information is not always transmitted through the shortest path, and the time complexity is high; when the degree of nodes is large, the EC has the phenomenon of local centralization of numerical value; The k-shell and H-index algorithm cannot distinguish between nodes with the same centrality value; PR ignores the situation of the page itself, and does not distinguish between the types of links in the page.
In order to improve the accuracy of identifying influential nodes, researchers have also proposed a great many of improved centrality methods. For instance, Wen and Deng[20] proposed a new method based on the local information dimensionality (LID) of nodes, in which the quasi-local information around each node is considered. Fei et al.[21] proposed a novel method based on the inverse-square law by defining the intensity of node. Gao et al.[22] proposed a local structural centrality measuring method by considering the number of nearest neighbors of a node and the topological connections among the neighbors. By taking into account the neighbors’ resource and the influence of spreading rate for the target node, Zhong et al.[23] proposed an improved iterative resource allocation (IIRA) method to identify influential nodes. Wang et al.[24] combined degree and weight strength of each node and proposed a modified efficiency centrality (EffCm) in weighted network. Zeng and Zhang[25] proposed a mixed degree decomposition (MDD) method by incorporating the residual degree and the exhausted degree to revise the original k-shell decomposition, but it is difficult to find the optimal parameter λ to achieve better results. Instead of ranking all nodes, Song et al.[26] aimed at ranking a small number of nodes in networks and proposed a rapid identifying method (RIM) to find the fraction of high-influential nodes.
Recently, based on the multiple attribute decision making (MADM),[14,27–30] Mo and Deng[31] proposed a multi-evidence centrality (MeC) method from the perspective of multi-attributes. Bian et al.[32] considered several different centrality methods as the multiple attributes and proposed a method to identify influential nodes based on the analytic hierarchy process (AHP). Hu et al.[33] proposed a weighted technique to identify influential nodes based on the technique for order preference by similarity to an ideal solution (TOPSIS) by calculating the weight of each attribute. To further explain the influence of neighbor nodes, Li et al.[34] proposed a gravity model in which both neighborhood information and path information are used to measure a node’s influence. By viewing the k-shell value of each node as its mass and the shortest path distance between two nodes as their distance, according to the gravity formula, Ma et al.[6] proposed a gravity centrality index to identify the influential spreaders.
Based on the above researches, it can be found that a common shortcoming in these centrality methods is that in most of methods only single centrality characteristic is considered, or the weight of every characteristic is regarded as being identical when multiple characteristics are considered, which may reduce the accuracy of ranking nodes. Another obvious shortcoming is that the influence of a node depends not only on its direct neighbors (1-step neighbors), but also on its 2-step and even more steps of neighbors. So to solve these issues, we need to do further research on identifying influential nodes. In view of the multi-attribute decision-making, we propose an improved centrality method in which taken into account are several aspects of node’s properties, including local topological characteristics, central location of nodes, propagation characteristics, and properties of neighbor nodes. We regard the classical centrality methods as the attributes of nodes. Considering the different effects of different attributes on nodes, we use entropy weight method,[14,28,29] which is a method to calculate the weight quantitatively, to weigh different attributes and obtain the combined centrality. Then the gravity law[6,34,35] is utilized to evaluate the influence of nodes. Finally, based on the real complex networks, the classical SIR[36] model is used to simulate the spreading process and verity the evaluation results. The results show that the proposed method is feasible and accurate in identifying influential nodes.
With the increasing variety and quantity of complex networks in society, there is an urgent need for some efficient, accurate and universally applicable centrality methods. Based on our proposed method, we can find the influential nodes in the networks, focus on them, protect and regulate them, so as to improve the reliability of the network. For example, in the transmission of infectious diseases, we can determine “super communicators”, predict the trend and scope of virus transmission, and provide a basis for virus prevention and control; in the traffic network, we can evaluate the traffic congestion to further ensure the connectivity of the network based on influential nodes. And the method can be applied to all kinds of networks, whether it is directed or undirected, weighted or unweighted. Therefore, nowadays, our research is of great significance in complex networks.
The rest of the paper is organized as follows. In Section
As stated above, a lot of researches have been carried out to identify influential nodes in complex networks and many centrality methods have also been proposed. These centrality methods show their advantages and superiorities together with some shortcomings and limitations. Based on these researches, in this paper we utilize the idea of the multi-attribute decision-making and propose a novel centrality method of identifying the influential nodes in the complex networks based on entropy weight method and gravity law.
Firstly, we summarize four important characteristics relating to the influence of node, namely local topological characteristic, central location of nodes, propagation characteristics and properties of neighbors. Then we regard seven classical centrality methods selected as the attributes of nodes, and according to the expression meanings of different centrality methods, we classify seven attributes as four categories.
Secondly, based on the idea of the multi-attribute decision-making, we select different attributes to form different combinations according to Table
Combination 1: DC, CC, BC, EC;
Combination 2: DC, CC, BC, PageRank;
Combination 3: DC, k-shell, BC, EC;
Combination 4: DC, k-shell, BC, PageRank;
Combination 5: H-index, CC, BC, EC;
Combination 6: H-index, CC, BC, PageRank;
Combination 7: H-index, k-shell, BC, EC;
Combination 8: H-index, k-shell, BC, PageRank;
Combination 9: DC, H-index, CC, k-shell, BC, EC, PageRank.
Then, because the influences of different attributes on nodes are often different, we use the entropy weight method to calculate the weight of each attribute in each combination quantitatively. Generally speaking, the smaller the information entropy of an index, the larger the variation of the index is and the larger its weight. By weighted averaging, we obtain the combined centrality of each node in each combination. The specific calculating steps are as follows.
Assuming that there are N nodes in the network, the set of research objects can be expressed as X = {x1,x2,…,xN}. If there are m centrality methods, the set of attributes of nodes can be expressed as A = {a1,a2,…,am}.
In the decision matrix P, xij (i = 1,2,…, N; j = 1,2,…, m) represents the j-th attribute of the i-th node.
In the formula, j = 1,2,…,m, K = 1/lnN;
On the whole, our method has several obvious advantages. Compared with other centrality methods, such as DC, BC, and CC, firstly, in our method more attention is paid to the multiple characteristics of nodes. Secondly, the attributes of nodes are not combined randomly. According to the four kinds of main characteristics of nodes, we divide several basic centrality methods into four groups and design different combination schemes. Then, we use entropy weight method to weigh the attributes in each combination quantitatively. In contrast to other similar multi-attribute methods, our method emphasizes the differences of different attributes on nodes, and also avoids subjective weighting. Besides, the influence of neighbor nodes on the target nodes is abstractly expressed by the “gravity” between nodes. In other gravity centrality methods, only the nearest neighbor is considered, however, in our method the nearest, the second nearest and the third nearest neighbors are all considered. Last but not least, by comparing the results of different combinations, we choose the optimal combination as the basis of identifying influential nodes, thereby increasing the accuracy of ranking results.
We chose six real and typical networks to prove the validity and accuracy of our proposed method. The detailed descriptions for six networks are as follows: (i) Friendship:[37] This directed network represents the friendship between students. Nodes represent students and the edge weights indicate interactions. (ii) Advogato:[38] This is the trust network of Advogato. Nodes are the users of Advogato and the directed edges represent trust relationships. (iii) Trust:[39] This is a network of who-trusts-whom relationships among users of the Bitcoin Alpha platform. A weighted edge from i to j represents trust ratings. (iv) Collaboration:[40] This network is a collaboration network. Nodes are the authors and edges are joint work between authors. (v) Airlines:[41] This is the directed network of flights between US airports. Nodes represent airports and edges represent connections from one airport to another. (vi) WormNet:[42] This is a network by integration of all data-type-specific networks by modified Bayesian integration. Nodes represent the genes and edges represent lines. Several statistical features of six networks are listed in Table
In order to compare the ability to identify the influential nodes in complex networks by using different centrality methods, we use standard SIR[36] model to simulate the spreading process and evaluate the ability of different nodes to propagate. In the SIR model, each node has three states, i.e., (i) susceptible state, S(t) is used to represent the number of individuals that are susceptible to (not yet infected) the disease; (ii) infected state, I(t) denotes the number of individuals which have been confirmed to be infected and can spread the disease to other susceptible individuals; (iii) recovered state, R(t) denotes the number of previously infected individuals that have recovered already and will never be infected any more. In each experiment, only one node is infected at the initial time, and the others are the susceptible. At each time step, the infected node randomly infects its susceptible neighbors with the probability λ, and the infected node becomes the recovered with the probability β. When there are no more infected nodes in the network, the spreading process ends. At the moment t, the total number of infected and recovered nodes in the network is denoted as F(t), which can be used to evaluate the transmission capacity of the initial infected node at time t. Obviously, in the ranking list, the more influential the node, the larger the value of F(t) at time t will be. For each initial infected node, we carry on 300 simulations and take the average as the final simulation data.
In identifying influential nodes in complex networks, many nodes may have the same centrality values, i.e., ranking values. For example, nodes in the same layer have the same k-shell value; nodes with the same neighbors have the same degree value. So it is difficult to distinguish the differences among these nodes. Here, we use the complementary cumulative distribution function (CCDF)[43] to analyze the distribution of influential nodes, and then make a comparison of advantage and disadvantage among various centrality methods. For good centrality methods, different nodes should have different ranking values, so the distribution of CCDF is relatively decentralized and the ranking range is larger than those of other methods.
To assess the performances of different centrality methods, we use Kendall’s tau coefficient[44–47] to analyze and verify the simulation results. The Kendall’s tau coefficient is an index used to reflect the correlation between two ordered sequences. Here, a series of influential nodes is obtained based on the centrality values and another one is based on the SIR simulation results. The higher the value of Kendall’s tau coefficient, the higher the correlation between the two ordered sequences is, that is, the more accurate the ranking list of the influential nodes generated by centrality methods is.
The imprecision function is another method of assessing the accuracy of centrality methods, which is proposed by Kitsak et al.[7,48] and modified by Liu et al.[48,49] Based on the sequence of influential nodes generated by the centrality method and the sequence generated by the real SIR numerical simulation, the imprecision function is used to analyze the difference in average spreading capability between the top M nodes in two ordered sequences. The definition of this function is as follows:
Based on the idea of the multi-attribute decision-making, in Section
According to the definition of Kendall’s tau coefficient, the larger the coefficient τ, the more accurate the ranking list of influential nodes is. We use Kendall’s tau coefficient to analyze the correlations between node spreading capability F(t) and centrality values of nodes in six real networks. The results are shown in Fig.
In addition, the value τ of combination 9 (com9) is not the largest in the six networks, that is, when all attributes are considered, the ranking result is not optimal. In general, combination 6 (com6) has the larger value of τ and more accurate ranking results in identifying influential nodes.
In the above, by comparing the coefficient τ, we know that the combination 6 is better than other combinations. Here, we choose the imprecise function to further prove this conclusion. The smaller the imprecise function ε, the more accurate the ranking result is. The results are shown in Fig.
In Fig.
Through the researches in Subsection
We use the comprehensive cumulative distribution function (CCDF) to compare the distribution of the ranking values for revealing the distinction of node between the optimal combination and seven basic centrality methods. The wider the range of ranking list, the better the centrality method is. The results are shown in Fig.
A good centrality method should not only distinguish nodes as much as possible, but also rank influential nodes as accurately as possible. It is obvious that the curve generated by combination 6 is higher than those generated by other centrality methods especially in Figs.
Thus these traditional centrality methods are hard to apply to the most networks with different structures. Therefore, our method has more stable and accurate performances.
According to the definition of the imprecision function, the smaller the value of ε, the more accurate the centrality method is. In Fig.
In this paper, we proposed a novel centrality method of identifying influential nodes in the complex networks. Considering the shortcomings of the traditional centrality methods, we adopt the idea of multi-attribute decision-making to comprehensively consider the multiple characteristics of nodes, and use entropy weight method to quantitatively weigh different attributes to obtain the combined centrality of nodes. Then, according to the gravity law, we replace the mass with the combined centrality to calculate the ‘gravity’ of the target node, and take this ‘gravity’ as the basis for identifying influential nodes. In order to evaluate the performances, we apply our method to six real networks and use SIR model to simulate the spreading process. By calculating the comprehensive cumulative distribution function (CCDF), we find that the proposed method can better distinguish nodes. Then, by calculating Kendall’s tau coefficient τ and the imprecision function ε, we find that ranking results of influential nodes generated by proposed method are more stable and accurate in different complex networks.
After that, further efforts are needed to improve the proposed method. Firstly, the selection of the basic attributes of nodes is relatively subjective. So how to rationally select attributes according to the real situation of networks is very necessary. Then, we use the classical SIR model to simulate spreading process. However, in the real-society networks with complex structures, nodes do not completely evolve according to the SIR model. Therefore, a more accurate and appropriate numerical model may be one of the important challenges and breakthroughs in the research of identifying influential nodes in complex networks.
[1] | |
[2] | |
[3] | |
[4] | |
[5] | |
[6] | |
[7] | |
[8] | |
[9] | |
[10] | |
[11] | |
[12] | |
[13] | |
[14] | |
[15] | |
[16] | |
[17] | |
[18] | |
[19] | |
[20] | |
[21] | |
[22] | |
[23] | |
[24] | |
[25] | |
[26] | |
[27] | |
[28] | |
[29] | |
[30] | |
[31] | |
[32] | |
[33] | |
[34] | |
[35] | |
[36] | |
[37] | |
[38] | |
[39] | |
[40] | |
[41] | |
[42] | |
[43] | |
[44] | |
[45] | |
[46] | |
[47] | |
[48] | |
[49] |